Skip to content

BUG: fix DataFrame.__getitem__ and .loc with non-list listlikes #21313

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 7, 2018

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Jun 4, 2018

xref #21309 , but worth fixing separately

@jreback
Copy link
Contributor

jreback commented Jun 5, 2018

this would have to be for 0.24 (not convinced we should do this)

@toobaz
Copy link
Member Author

toobaz commented Jun 5, 2018

not convinced we should do this

Do you think we should disable listlikes in other cases? Or you don't find the discrepancy problematic?

@gfyoung gfyoung added the Indexing Related to indexing on series/frames, not to indexes themselves label Jun 6, 2018
@toobaz toobaz force-pushed the df_getitem_21294 branch from 34647a8 to f56c72f Compare June 11, 2018 20:36
@pep8speaks
Copy link

pep8speaks commented Jun 11, 2018

Hello @toobaz! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 07, 2018 at 08:11 Hours UTC

@toobaz toobaz force-pushed the df_getitem_21294 branch from f56c72f to 445dcdf Compare June 11, 2018 20:54
@toobaz toobaz mentioned this pull request Jun 11, 2018
@codecov
Copy link

codecov bot commented Jun 11, 2018

Codecov Report

Merging #21313 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21313      +/-   ##
==========================================
- Coverage   91.95%   91.95%   -0.01%     
==========================================
  Files         160      160              
  Lines       49820    49818       -2     
==========================================
- Hits        45812    45809       -3     
- Misses       4008     4009       +1
Flag Coverage Δ
#multiple 90.33% <100%> (-0.01%) ⬇️
#single 42.07% <75.67%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/frame.py 97.19% <100%> (ø) ⬆️
pandas/core/sparse/frame.py 94.78% <100%> (-0.06%) ⬇️
pandas/core/internals.py 95.46% <0%> (-0.08%) ⬇️
pandas/core/indexes/category.py 97.28% <0%> (+0.27%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 620abc4...5bd6eb8. Read the comment docs.

@@ -110,6 +110,8 @@ Bug Fixes
- Bug in :meth:`Series.reset_index` where appropriate error was not raised with an invalid level name (:issue:`20925`)
- Bug in :func:`interval_range` when ``start``/``periods`` or ``end``/``periods`` are specified with float ``start`` or ``end`` (:issue:`21161`)
- Bug in :meth:`MultiIndex.set_names` where error raised for a ``MultiIndex`` with ``nlevels == 1`` (:issue:`21149`)
- Bug in :meth:`DataFrame.__getitem__` and :meth:`DataFrame.loc` which did not accept columns keys passed as non-list iterables (:issue:`21294`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be for 0.24

if self.columns.nlevels > 1:
return self._getitem_multilevel(key)
return self._get_item_cache(key)
except (ValueError, TypeError):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what hits this exception here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of cases, for instance pd.DataFrame(index=range(3), columns=range(3))[['a', 'b']]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so a list-like key?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I guess so

return self._get_item_cache(key)
# We are left with two options: a single key, and a collection of keys,
# We interpret tuples as collections only for non-MultiIndex
coll_key = is_list_like(key) and (not isinstance(key, tuple) or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you take the inverse and name this is_single_key

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a single tuple is a single key, yes? (MI or no)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a single tuple is a single key, yes? (MI or no)?

Right, I was assuming tuples as collections were allowed, luckily they weren't


def _getitem_array(self, key):
if not coll_key:
# This test preserves #9519; the second part preserves #21309
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give a more informative comment

elif len(key) != len(self.index):
raise ValueError('Item wrong length %d instead of %d.' %
(len(key), len(self.index)))
# check_bool_indexer will throw exception if Series key cannot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line here

@@ -501,9 +501,11 @@ def test_constructor_dict_of_tuples(self):
tm.assert_frame_equal(result, expected, check_dtype=False)

def test_constructor_dict_multiindex(self):
check = lambda result, expected: tm.assert_frame_equal(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parameterize this test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not here, but I can avoid fixing the lambda

indexer = np.arange(len(df.columns))[isna(df.columns)]

if len(indexer) == 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on each of these cases

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done)

tm.assert_series_equal(df.iloc[:, indexer[0]],
df.loc[:, np.nan])

# multiple nans should fail
# multiple nans should result in DataFrame
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really!

@jreback jreback added this to the 0.24.0 milestone Jun 12, 2018
@toobaz toobaz force-pushed the df_getitem_21294 branch from 445dcdf to ba6c074 Compare June 12, 2018 05:55
@toobaz
Copy link
Member Author

toobaz commented Jun 12, 2018

@jreback ready for me

@jorisvandenbossche
Copy link
Member

I have my doubts we should do this, commented on the relevant issue: #21294

@toobaz
Copy link
Member Author

toobaz commented Jun 30, 2018

Discussion (I think) concluded, conflicts fixed by rebasing... ready for me. Objections @jreback ?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz I had some comments that I didn't click on. and can you rebase on master.

if self.columns.is_unique and key in self.columns:
if self.columns.nlevels > 1:
return self._getitem_multilevel(key)
return self._get_item_cache(key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you changing to directly use
_get_item_cache here rather than _getitem_column? (is it removed)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removed. It did a uniqueness test which is no more necessary, and was misleading anyway, as it could not really manage all cases in which a single column is returned.

if self.columns.nlevels > 1:
return self._getitem_multilevel(key)
return self._get_item_cache(key)
except (ValueError, TypeError):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so a list-like key?

indexer = convert_to_index_sliceable(self, key)
if indexer is not None:
return self._getitem_slice(indexer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the change here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed?

Yes, removed, it was a pretty useless one-liner

is_single_key = isinstance(key, tuple) or not is_list_like(key)

if is_single_key:
if self.columns.nlevels > 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn’t this case handled by _getitem_multilevel (above)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only if columns.is_unique

if self.columns.nlevels > 1:
return self._getitem_multilevel(key)
indexer = self.columns.get_loc(key)
if is_integer(indexer):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an argument for _take to accept a scalar integer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too sure. _take is such a fundamental method that there might be good reasons to keep it simple. Anyway, we can discuss this (in some other issue).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moreover, the problem should disappear when we fix #9519 , that is, when the return type becomes predictable from the index (non-)uniqueness

@toobaz
Copy link
Member Author

toobaz commented Jul 4, 2018

@jreback added a comment clarifying the except clause, rebased, ready for me

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok lgtm. can you add a whatsnew, maybe needs a subsection.

@toobaz toobaz force-pushed the df_getitem_21294 branch from 9c57fd4 to 5bd6eb8 Compare July 7, 2018 08:10
@toobaz
Copy link
Member Author

toobaz commented Jul 7, 2018

ok lgtm. can you add a whatsnew, maybe needs a subsection.

I don't think a subsection is worth putting in the whatsnew - because the change is relatively marginal. But I do think we'll need to clarify list-likes in general in the docs - see #21784

@jreback jreback merged commit 32ee973 into pandas-dev:master Jul 7, 2018
@jreback
Copy link
Contributor

jreback commented Jul 7, 2018

thanks @toobaz nice!

@toobaz toobaz deleted the df_getitem_21294 branch July 8, 2018 08:24
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
…das-dev#21313)

* BUG: fix DataFrame.__getitem__ and .loc with non-list listlikes

close pandas-dev#21294
close pandas-dev#21428
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame[np.nan] raises TypeError with non-unique columns Dict/dict keys in DataFrame.__getitem__
5 participants